Assessing agreement on classification tasks: the kappa statistic

Author

  • Jean Carletta
Abstract

Currently, computational linguists and cognitive scientists working in the area of discourse and dialogue argue that their subjective judgments are reliable using several different statistics, none of which are easily interpretable or comparable to each other. Meanwhile, researchers in content analysis have already experienced the same difficulties and come up with a solution in the kappa statistic. We discuss what is wrong with reliability measures as they are currently used for discourse and dialogue work in computational linguistics and cognitive science, and argue that we would be better off as a field adopting techniques from content analysis.
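
For context, the statistic itself (in the formulation the content-analysis literature uses, which the paper adopts) corrects raw agreement for agreement expected by chance:

\[ \kappa = \frac{P(A) - P(E)}{1 - P(E)} \]

where P(A) is the observed proportion of agreement among the coders and P(E) is the proportion expected if they labeled at random with the same marginal category frequencies; κ = 1 indicates perfect agreement and κ = 0 agreement no better than chance.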

Similar resources

Diagnostic concordance among dermatopathologists in basal cell carcinoma subtyping: Results of a study in a skin referral hospital in Tehran, Iran

Background: Basal cell carcinoma (BCC) is the most prevalent of the non-melanoma skin cancers (NMSC), which are themselves the most common skin cancers. Histopathological subtyping of BCC poses a problem for therapeutic management. We therefore performed a histopathologic study aimed at better classification of BCCs on the basis of interobserver diagnostic judgment. Methods: We conducted this cross...

Analyzing Appraisal Automatically

AAAI 2004 Spring Symposium on Exploring Attitude and Affect in Text. Department of Linguistics, Simon Fraser University, Burnaby, B.C., V5A 1S6, Canada. Carletta, J. 1996. Assessing agreement on classification tasks: the kappa statistic. Computational Linguistics 22 (2): 249-254. Hatzivassiloglou, V., and McKeown, K. 1997. Predicting the semantic orientation of adjectives. In , 174-181. Krippendorff, K. 1980. Beverly Hills, CA:...

On population-based measures of agreement

Measuring agreement between qualified experts is commonly used to determine the effectiveness of a diagnostic procedure. Many methods are available for assessing agreement, including Cohen’s kappa, a very popular summary measure owing to its simple usage and interpretation. However, it has previously been shown that its usage suffers from a number of flaws, which can...
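
As a concrete illustration of the measure this entry discusses, here is a minimal sketch of Cohen's kappa for two raters (the function name, data, and labels are hypothetical, not taken from the article):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)

    # Observed proportion of agreement: how often the labels match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: each rater's marginal label frequencies,
    # multiplied per category and summed.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a)

    # Kappa rescales observed agreement so 0 = chance, 1 = perfect.
    return (p_o - p_e) / (1 - p_e)

# Two hypothetical coders labeling ten utterances as S(tatement)/Q(uestion).
coder1 = ["S", "S", "Q", "S", "Q", "S", "S", "Q", "S", "S"]
coder2 = ["S", "Q", "Q", "S", "Q", "S", "S", "S", "S", "S"]
print(cohens_kappa(coder1, coder2))  # ~0.524
```

Here the coders agree on 8 of 10 items (P(A) = 0.8), while the agreement expected from their label frequencies alone is P(E) = 0.58, giving κ ≈ 0.52: moderate agreement above chance.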

A model and measure of agreement for population-based studies

Measuring agreement between physicians in their classification of items such as mammograms for the presence of disease is an important tool in assessing the reliability of a diagnostic procedure, and the modeling of agreement data is a popular topic in the biomedical and social sciences. Interest often lies in assessing agreement in the underlying diagnostic procedure and making inferences for the popula...

Journal:
  • Computational Linguistics

Volume: 22  Issue: 2

Pages: 249-254

Publication date: 1996